Classification and Retrieval from Mailing Lists and Forums

Authors

  • Preethi Raghavan
  • Shajith Ikbal
  • Nanda Kambhatla
Abstract

In this paper, we address two tasks: classifying the posts of mailing lists and forums, and retrieving relevant information from them. The paper describes our participation in the corresponding FIRE 2010 evaluation sub-tasks. We approach the classification problem in two ways. In the first, we pose it as a sequence labeling task using conditional random fields (CRFs), where information from the entire thread is used to label each post in that thread. In the second, we model it as a document classification task, treating each post as an independent document and using a support vector machine (SVM)-like classifier. We discuss the performance of both approaches, as well as the effect of the different features we used to train the CRFs. For the retrieval task, we propose combining the text match score with a goodness score for the thread, which measures the quality of the resolution text contained in the thread. A classifier is employed to identify the posts of the thread that could be a solution, and the goodness score is computed from their contents.
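The retrieval idea in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two scores are combined or how the goodness score is derived from the candidate solution posts, so everything below is an assumption made for illustration: the linear interpolation, the `weight` parameter, the use of the maximum per-post solution probability as the goodness score, and the names `Thread`, `goodness`, `combined_score`, and `rank` are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Thread:
    thread_id: str
    # Assumed: a text match score already normalized to [0, 1].
    text_match: float
    # Assumed: per-post probabilities, from some classifier, that a post is a solution.
    solution_probs: List[float] = field(default_factory=list)


def goodness(thread: Thread) -> float:
    """Hypothetical goodness score: the strongest solution candidate in the thread."""
    return max(thread.solution_probs, default=0.0)


def combined_score(thread: Thread, weight: float = 0.7) -> float:
    """Assumed combination: linear interpolation of text match and goodness."""
    return weight * thread.text_match + (1.0 - weight) * goodness(thread)


def rank(threads: List[Thread], weight: float = 0.7) -> List[Thread]:
    """Order threads by combined score, best first."""
    return sorted(threads, key=lambda t: combined_score(t, weight), reverse=True)


threads = [
    Thread("A", text_match=0.9, solution_probs=[0.1, 0.2]),
    Thread("B", text_match=0.8, solution_probs=[0.3, 0.95]),
    Thread("C", text_match=0.5),  # no candidate solution posts
]
ranked_ids = [t.thread_id for t in rank(threads)]
```

In this toy data, thread B matches the query slightly less well than thread A but contains a strong solution candidate, so it moves ahead of A once the goodness score is taken into account.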


Related resources

Research Paper Clustering of Mailing List Discussions for Information Retrieval

Recently, threaded discussion communities, such as web forums and mailing lists, have become increasingly popular with Internet users. Millions of registered members converse about leading-edge scientific issues or just about trivial matters of their everyday life. This makes databases of these communities a valuable source for information, but their colloquial texts are hard to search and su...

Mailing Lists: Why Are They Still Here, What’s Wrong With Them, and How Can We Fix Them?

Mailing lists have existed since the early days of email and are still widely used today, even as more sophisticated online forums and social media websites proliferate. The simplicity of mailing lists can be seen as a reason for their endurance, a source of dissatisfaction, and an opportunity for improvement. Using a mixed-method approach, we study two community mailing lists in depth with int...

FLOSS developers committing to CVS/SVN as much as they are talking in mailing lists? Challenges for Integrating Data from Multiple Repositories

This paper puts forward a framework for investigating Free and Open Source Software (F/OSS) developers' activities in both source code and mailing list repositories. We used data dumps of fourteen projects from the FLOSSMetrics (FM) retrieval system. Our intentions are (i) to present a possible methodology, its advantages and disadvantages which can benefit future researchers using some aspects...

Supporting Comparison of Developer Profiles across Online Communities

Current software development practices leave a plethora of activities that are archived in version control systems, issue trackers, mailing lists, or Question and Answer (Q&A) forums. Software managers are increasingly using these online activities to better evaluate job candidates. We introduce our tool, Visual Resume, that displays visual overviews of developers’ contributions in code sharing...

Opinion Mapping: Information Visualization Approaches for Comparative Sentiment Analysis

In this position paper, we discuss the problem of extracting information about chronic diseases from the large volume of text written in health blogs, mailing lists, forums, and other electronic venues, then making this information accessible via structured queries, while analyzing it to map out patterns among the opinions and demographics of users. Information retrieval systems exist for spati...

Publication date: 2010